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^ ; ABSTRACT 

3 : 

. The halo occupation distribution (HOD) describes the relation between galaxies 

I and dark matter at the level of individual dark matter halos. The properties of galaxies 

I residing at the centers of halos differ from those of satellite galaxies because of differ- 

I ences in their formation histories. Using a smoothed particle hydrodynamics (SPH) 

Ti^lj- ' simulation and a semi-analytic (SA) galaxy formation model, we examine the sepa- 

. rate contributions of central and satellite galaxies to the HOD, more specifically to the 

QQ ■ probability P{N\M) that a halo of virial mass M contains N galaxies of a particular 

, class. In agreement with earlier results for dark matter subhalos, we find that the mean 

Q ! occupation function {N)m for galaxies above a baryonic mass threshold can be approx- 

fH I imated by a step function for central galaxies plus a power law for satellites, and that 

Qh| the distribution of satellite numbers is close to Poisson at fixed halo mass. Since the 

O ' number of central galaxies is always zero or one, the width of P{N\M) is narrower than 

' a Poisson distribution at low N and approaches Poisson at high A^. For galaxy samples 

d ■ defined by different baryonic mass thresholds, there is a nearly linear relation between 

J> . the minimum halo mass Mmin required to host a central galaxy and the mass Mi at 

^ , which an average halo hosts one satellite, with Mi ~ 14Mmin (SPH) or Mi ~ ISMmin 

^ I (SA). The stellar population age of central galaxies correlates with halo mass, and this 
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correlation explains much of the age-dependence of the galaxy HOD. The mean occu- 
pation number of yoTing galaxies exhibits a local minimum at M ~ lOMmin where halos 
are too massive to host a young central galaxy but not massive enough to host satellites. 
Using the SA model, we show that the conditional galaxy mass function at fixed halo 
mass cannot be described by a Schechter function because central galaxies produce a 
"bump" at high masses. We suggest parameterizations for the HOD and the conditional 
luminosity function that can be used to model observed galaxy clustering. Many of our 
predictions are in good agreement with recent results inferred from clustering in the 
Sloan Digital Sky Survey. 

Subject headings: cosmology:thcory — galaxies:formation — galaxies:halos — large- 
scale structure of universe 



1. Introduction 

The Halo Occupation Distribution (HOD) has emerged as a powerful framework for describing 
galaxy bias and modeling galaxy clustering (e.g., Ma & Fry 2000; Peacock &; Smith 2000; Seljak 
2000; Scoccimarro ct al. 2001; Berlind & Weinberg 2002). It characterizes the bias between galaxies 
and mass in terms of the probability distribution P{N\M) that a halo of virial mass M contains 
N galaxies of a given type, together with relative spatial and velocity distributions of galaxies 
and dark matter within halos. The HOD as a function of galaxy type (defined by luminosity, 
color, morphology, etc.) is a fundamental prediction of galaxy formation theory (e.g., Kauffmann, 
Nusscr, & Steinmetz 1997; Kauffmann et al. 1999; Benson et al. 2000; White, Hernquist, & Springel 
2001; Yoshikawa et al. 2001; Berlind et al. 2003; Kravtsov ct al. 2004), which can be tested by 
deriving the HOD empirically from observed clustering. Berlind ct al. (2003, hereafter BOS) showed 
good agreement between the predictions of smoothed particle hydrodynamics (SPH) simulations 
and semi-analytic (SA) calculations for the same cosmological model, and Kravtsov et al. (2004, 
hereafter K04) showed that the HOD of substructures in high resolution N-body simulations has 
many of the same features. In this paper we extend the analysis of B03 by separately examining 
the contributions of central and satellite galaxies to the HOD, following the approach of K04. 

The central galaxies of halos arc treated distinctly in SA models of galaxy formation (e.g.. 
White & Frcnk 1991; Kauffmann, White, & Guideroni 1993; Cole ct al. 1994; Avila-Reese, Firmani, 
& Hernandez 1998; van Kampcn, Jimenez, & Peacock 1999; Somcrville & Primack 1999). Central 
galaxies accrete most or all of a halo's cooling gas. In a merger between two halos, the most massive 
progenitor galaxy becomes the central galaxy of the merged halo, and other, satellite galaxies may 
merge with it after being dragged in by dynamical friction. In hydrodynamic simulations (e.g., Cen 
&; Ostriker 1992; Katz, Hernquist, & Weinberg 1992; Evrard, Summers, & Davis 1994; Pearce et 
al. 1999; White, Hernquist, & Springel 2001; Yoshikawa et al. 2001) there is no explicitly separate 
treatment of central and satellite galaxies, but the central objects of halos emerge as a distinct 
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class, more massive and usually older than other galaxies in the same halo (BOS), in qualitative 
agreement with SA predictions. The central galaxies in SPH simulations lie close to the most 
strongly bound dark matter particles and move slowly relative to the halo center-of-mass, while 
the radial profiles and velocity dispersions of satellites are similar to those of dark matter. BOS 
show that SPH and SA predictions of P{N\M) are remarkably similar provided one chooses mass 
thresholds that yield the same galaxy number density. They further show that the two methods 
predict similar dependences of P{N\M) on galaxy baryonic mass and stellar population age. 

K04 show that the description of P{N\M) for subhalos in N-body simulations simplifies con- 
siderably if one distinguishes the contributions of central and satellite substructures. (We will use 
the terms substructure and subhalo interchangeably.) For a subhalo sample limited by maximum 
circular velocity, the mean occupation function {N)m = ^'n=o NP{N\M) is well approximated by 
the sum of a step function for the central substructure, (A^cen)M = &{M — Mmin)) where Q{x) = 
if X < and Q{x) = 1 if a; > 0, and a power-law for the satellites, {Nsat)M = (M/Mi)'^, with 
a 1. The resulting shape, with a cutoff, a plateau at low M, and a power law at high M, is 
similar to that found for SPH and SA model galaxies (BOS) . More importantly, if one assumes that 
the distribution of A^sat with respect to the mean (Asat)Af is a Poisson distribution, then the model 
naturally explains the transition from a sub-Poisson width at low (N) m (where the contribution 
of the central galaxy dominates) to a Poisson width at high {N)m (where the satellites dominate). 
This transition is a common feature in all of the SA and SPH calculations mentioned above, and the 
sub-Poisson fluctuations at low {N)m play a crucial role in shaping the galaxy 2-point correlation 
function (Benson ct al. 2000). Guzik & Seljak (2002) also distinguish central and satellite galaxies 
in their HOD modeling of galaxy-galaxy lensing, but they focus on samples defined by bins of 
luminosity (rather than thresholds) and therefore adopt a different parameterization. They show 
that the SA models of Kauffmann et al. (1999) predict a {N)m for a narrow luminosity bin that is 
well approximated by a Gaussian in log M for central galaxies and a power law for satellites. 

Here we apply the K04 approach to the SPH and SA galaxy populations studied by BOS. We 
consider samples limited by thresholds in baryonic mass, which should be similar to observational 
samples limited by luminosity thresholds, particularly for observations at longer wavelengths. We 
also examine samples divided on the basis of stellar population age, which should be comparable 
to observational samples divided by color or spectral type. Distinguishing central and satellite 
galaxies proves especially valuable in understanding the behavior of these "red" and "blue" galaxy 
HODs. We present simple parameterizations of the predicted HODs, which can guide efforts to infer 
the HOD from observed galaxy clustering. We also present results for the conditional luminosity 
function of galaxies at fixed halo mass (Yang, Mo, &; van den Bosch 2003; van den Bosch, Yang, 
&; Mo 2003a; van den Bosch, Mo, & Yang 200Sc). Our predictions can be tested by analyses of 
large galaxy redshift surveys like the Two Degree Field Galaxy Redshift Survey (2dFGRS; Colless 
et al. 2001) and the Sloan Digital Sky Survey (SDSS; York et al. 2000). Several of our qualitative 
predictions are in good agreement with recent analyses of these surveys, in particular with Zehavi et 
al.'s (2005) study of the luminosity and color dependence of the SDSS galaxy correlation function. 
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as we discuss in §5. 

2. Theoretical Models 

We use the same SPH simulation and SA galaxy formation model as BOS. We review these 
calculations briefly here and refer the reader to BOS for more details. 

The SPH simulation assumes a cosmological model with Jl^ = 0.4, = 0.6, Qi, = 0.02/i~^, 

h = i?o/(100kms-^Mpc-^) = 0.65, n = 0.95, and ag = 0.8 (see descriptions by Murali at al. 2002; 
Dave, Katz, & Weinberg 2002; Weinberg ct al. 2004). It follows the evolution of 144''^ gas and 144''^ 
dark matter particles in a box of 50/i~^Mpc on each side from z = 49 to z = 0. The softening length 
of the gravitational force is 7/i~-'^kpc comoving. The simulation incorporates radiative and Compton 
cooling and phenomenological prescriptions for star formation and supernova feedback. Galaxies 
are identified as gravitationally bound groups of star and cold gas particles that are associated with 
a common local maximum in the baryon density. Dark matter halos are identified using a friends- 
of-friends algorithm (Davis et al. 1985) with a linking length of 0.173 times the mean interparticle 
separation. Each galaxy is assigned to the halo that contains the dark matter particle closest to 
the galaxy center of mass. In each halo, the galaxy whose center is closest to the position of the 
most bound dark matter particle (defined as the halo "center") is tagged as the "central" galaxy, 
while others are regarded as satellite galaxies. 

The SA model used by BOS is GALFORM (Cole et al. 2000). For a given halo, the model 
generates a "merger tree" using a Monte Carlo method, starting at z = and branching into 
progenitor halos until it reaches a starting redshift. Then the model works forward in time to follow 
the formation and evolution of galaxies in each progenitor halo. Phenomenological prescriptions 
are used to model star formation and feedback, dynamical friction within halos, and mergers of 
galaxies. There is always a galaxy residing at the center of each halo. Before a halo experiences 
a major merger, cooling gas is assumed to accrete onto the disk of this galaxy and form stars. 
If two halos merge, the most massive galaxy is set to become the central galaxy of the merged 
halo, and any other galaxies become satellites. If two galaxies of comparable mass merge, then 
all their stars (disk+spheroid) form the spheroid of the remnant, which may regrow a new disk 
by subsequent gas accretion. Some adjustable parameters of this model are chosen on the basis of 
observed properties of the local galaxy population, most notably the galaxy luminosity function. 
However, no parameters are adjusted to reproduce observed galaxy clustering or the SPH results. 
The model uses the same cosmological parameters as the SPH simulation, and it is supplied with 
the same halo population identified in the SPH simulation so that the HOD predictions of these 
two models can be compared halo by halo. To improve statistics, ten SA realizations are conducted 
for each of the 70 most massive SPH halos, and forty realizations are done for less massive halos 
in each mass bin of width AlogM = 0.1. 
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3. Halo Occupation Distributions 

Galaxy samples of different space densities are constructed by choosing galaxies above different 
baryonic mass thresholds. The baryonic mass resolution limit of the SPH simulation (corresponding 
to 64 SPH particles) is 5.42 x lO^'^M©, and the space density of simulated galaxies above this 
threshold is fig = 0.02/i^Mpc~^. The baryonic mass threshold that yields the same space density in 
the SA model is 1.45 x lO^^M©, lower because of the suppressed gas cooling and enhanced supernova 
feedback (see B03 for further discussion). In §3.1 we focus on this sample with fig = 0.02/i^Mpc~^, 
which is the largest one we can create from the SPH simulation. It should correspond roughly to 
an observational sample defined by a luminosity threshold = —18.6 (0.18L,,), where we have 
used the Blanton et al. (2003) luminosity function to find the luminosity threshold that yields 
the same comoving space density. Scatter in stellar mass-to-light ratios makes mass-threshold and 
luminosity-threshold samples different, but the substantial variations of these mass-to-light ratios 
across the galaxy population are largely a systematic function of luminosity (Kauffmann et al. 
2003), with only moderate scatter at fixed L. 

In §3.2, we divide this sample into two equal parts on the basis of stellar population age — 
the median lookback time (when half the stars had formed) in the case of the SPH simulation 
and the mean lookback time in the case of the SA model. The SA code automatically computes 
the mass-weighted mean stellar age, but it does not produce the step-by-step stellar mass record 
needed to compute the median stellar age a posteriori. The SPH code does produce this record, 
making the median age easier to compute; this quantity is probably more robust than the mean 
age in the SPH simulations because star formation rates are underestimated in a galaxy's early 
history, when it is near the mass resolution threshold. Our analysis uses only the rank order of ages 
not the absolute values, so we expect the difference in age definitions to have minimal impact on 
our results. Population age should be a fairly good proxy for galaxy color or spectral type, so this 
division mimics the red/blue or early/late divisions studied by Zchavi et al. (2002; 2005) in the 
SDSS and Norberg et al. (2002) and Madgwick et al. (2003) in the 2dFGRS. In §3.3, we consider 
samples of progressively higher baryonic mass threshold and lower mean space density, to investigate 
the predicted luminosity dependence of the HOD. Although we present both SPH and SA results 
throughout this section, the SA model has better statistics because of the multiple realizations of 
each halo, and it thus allows us to investigate some finer points of the HOD predictions. 



3.1. HOD for All Galaxies 

Figure 1 shows mean occupation numbers as a function of halo mass predicted by the SPH 
simulation and by the SA model, for the fig = 0.02/i^Mpc~'^ samples. Mean occupation functions 
for central and satellite galaxies are similar to those found for subhalos by K04 (sec their Figure 4). 
The total mean occupation function is the sum of a step-like function representing the contribution 
of central galaxies and a power-law-like function representing the contribution of satellite galaxies. 
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If each sample is constructed by first selecting dark matter halos above a minimum mass, the 
mean occupation function {Nccn)M of central "subhalos" (really the halos themselves) would be a 
strict step function, since A'^ccn = below the minimum mass and A'^ccn = 1 above it. However, our 
samples are based on minimum galaxy baryonic masses, so scatter in the relation between baryonic 
mass of the central galaxy and virial mass of the halo smooths the step, with (A'cen)M increasing 
from 0.1 to 0.9 over a factor ~ 2-3 in halo mass. Since a halo necessarily contains either zero or 
one central galaxies, the probability distribution P{Ncen\{^ccn)M) is a nearest-integer distribution 
(more technically, a Bernoulli distribution), with P(l) = 1 — P(0) = {Ncen)M- 

The mean occupation function of satellite galaxies in both the SPH and SA calculations is 
roughly a power law, {Nsa.t)M oc M", though it tails off more rapidly at masses where (A^sat)M < 1- 
In fact, the power law index is a ~ 1 in both cases, implying a simple proportionality between halo 
mass and satellite number. The mean number of galaxy pairs in a halo, {N{N — 1)), is important 
for the small scale behavior of the galaxy two-point correlation function. Upper panels of Figure 1 
plot the quantity {N{N - l))V2/(Ar), as a function of halo mass. If N is Poisson distributed, 
then this quantity is unity. Satellite galaxies have {N{N — l))^/'^/{N) ^ 1 for all masses where 
{Nsat)M > 1 (open circles). In the regime where (A'sat)M ~ 1, the SA model seems to predict 
a value of {N{N - l))^/'^ / {N) slightly lower than unity. The width of the total galaxy P{N\M) 
(filled circles) is close to Poisson at high masses, where satellite galaxies dominate the occupation 
number, but it is substantially sub-Poisson {{N{N — 1))^/'^ /{N) < 1) at low masses because the 
nearest-integer distribution for central galaxies is narrower, with Ncen{Ncen — 1) = 0. Our findings 
that satellite galaxies have a power-law-like {Nsa,t)M with a f» 1, that satellite numbers follow a 
roughly Poisson distribution with respect to their mean, and that the sub-Poisson fluctuations of 
the total galaxy P{N\M) are thus a consequence of central galaxies, are all in agreement with the 
findings of K04 for dark matter subhalos. 

To test how closely the probability distribution of satellite occupation number follows a Poisson 
distribution. Figure 2 plots -P(iVsat|(A'sat)M) at different values of {Nsa.t)M (points), in comparison 
to a Poisson distribution that has the same mean (dotted histogram). For both the SPH and the 
SA models, the typical bin width of halo mass is AlogM=0.4, but it is set to be 0.1 for the SA 
model if (A^sat)M < 2. As a result of multiple realizations, the SA model gives better statistics 
than the SPH simulation, especially for high occupation numbers that correspond to rare, massive 
halos. Nevertheless, both models predict that P(A''satKA''sat)M) is impressively close to a Poisson 
distribution over the full range (A'sat)M = 0.5 to (A^sat)M = 15 where we have adequate statistics. 
While visual inspection of Figure 2 suggests that the match is not exact (we have not carried out a 
formal statistical test), it shows that the Poisson approximation for -P(A?sat|(A^sat)M) is likely to be 
adequate for most predictions of galaxy clustering statistics, such as counts-in-cells distributions 
and higher order correlation functions. 

To model two-point correlation functions of SDSS galaxies, Zehavi et al. (2005) parameterize 
the mean occupation function for galaxies brighter than a luminosity threshold as a step function 
for (Acen)M plus a truncated power law (Asat)M = (M/Mi)" for satellites. This simple model has 
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three free parameters: the minimum halo mass Mmin below which N^en = -^sat = and the slope 
a and normalization Mi of the power law. For a given cosmology, two parameters can be adjusted 
to fit the observed correlation function while the third (usually Mmin) is set by matching the mean 
galaxy number density of the sample. Figure 1 shows that this parameterization captures the main 
features of the theoretically predicted HOD. It has the benefit of having the same number of free 
parameters as a power law (since one parameter is fixed by the galaxy number density) , allowing a 
fair comparison of goodncss-of-fit to observed correlation functions. However, as measurements of 
complementary clustering statistics become available, we can afford to fit HOD models with more 
free parameters, and they may even become necessary to match the data. For example, with the 
high mass end of {N)m constrained by the group multiplicity function (e.g., Peacock & Smith 2000; 
Marinoni Sz Hudson 2002; Kochanek et al. 2003; Yang et al 2005b; A. Berlind, et al., in preparation), 
we can use the correlation function and other clustering statistics to investigate the cutoff profile of 
the mean occupation function. Therefore, it is useful to more accurately parameterize the results 
from the theoretical models with slightly more complicated prescriptions. Ultimately, we would like 
to fit observations using a model that does not rely on a theoretically predicted form, but in the near 
term we can use the predicted form and compare observationally inferred and theoretically predicted 
parameter values. It is worth noting that if the adopted cosmological model is substantially wrong 
then no choice of HOD can match the full range of galaxy clustering statistics (Z. Zheng Sz D. 
Weinberg, in preparation; see van den Bosch, Mo, Sz Yang [2003c] for closely related arguments 
using the conditional luminosity function) . 

For both the SPH and SA models, we find that the mean occupation function for central 
galaxies can be well represented by 



The parameters are Mmin, the characteristic minimum mass of halos that can host such central 
galaxies, and aiog the characteristic transition width. This functional form corresponds to a 
Gaussian distribution of log Mgai, at fixed halo mass M. If the mean baryonic mass is (Mgai) oc iW* 



mass at fixed halo mass. For a sample defined by a luminosity threshold instead of a baryonic mass 
threshold, the halo mass dispersion cJiog m will be somewhat larger because of the scatter in stellar 
mass-to-light ratios, and the scatter may deviate from the Gaussian distribution. The observed 
Tully-Fisher (1977) relation (roughly (L) oc M, a\ogL ~ 0.15) suggests cr\og m ~ 0.15 in the mass 
range M ~ W^'^Mq corresponding to typical bright spirals. (All logarithms in this paper are base 
10.) A detailed discussion of the scatter in the mass-luminosity relation and its effects can be found 
in Tasitsiomi et al. (2004). 




(1) 



where erf () is the error function 




(2) 
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At low masses, the mean occupation function for satellite galaxies drops below a power law 
extrapolation of (A^sat)Af from high masses. Similar to K04, we find that the full range of (iVsat)M 
can be well approximated by the form 

(iVsat)M = [(M - MQ)/M[f (3) 

for M > Mo, where the truncation mass Mq for satellites may differ from the truncation mass Mmin 
for central galaxies. Note that with this parameterization, M[ is noi the mass at which (A'sat)M = 1- 
The top panels of Figure 3 show fits of the 3-parameter (Mmin, Mi, a) and 5-parameter (Mmin, 
Clog M, M[, Mo, a) models to the SPH and SA mean occupation functions. The parameter values 
are listed in Table 1. Note that these values are likely to depend on the cosmological model as 
well as the galaxy space density and the physics encoded in the SPH and SA calculations. The 
3-parameter form (used by Zehavi et al. 2005) captures the results of both models well but not 
perfectly, while the 5-parameter form gives a nearly perfect fit in both cases. 

If we assume that central and satellite galaxies have nearest-integer and Poisson distributions, 
respectively, then we can calculate all the higher moments of the occupation number based on 
the fits to the mean occupation. In particular, we can predict average numbers of galaxy pairs 
and triplets inside halos, which are relevant to the 1-halo terms of the galaxy 2-point and 3-point 
correlation functions, respectively. The total occupation number N is the sum of A^cen and Nggi, 
and it is easy to show that 

{N{N - 1)) = (A^cen(A^cen - 1)) + 2(A^cenA^sat) + {Nse,t{Ns^t - 1)) (4) 

and 

(A^(A^ - l)iN - 2)) = {N,,^iN,,^ - l)(Ar,e, - 2)) + S{N,,^{N,^^ - l)N,^t) ... 

+S{N^enNs^t (A^sat - 1)) + (A^sat (A^sat - 1) (A^sat - 2)) . ^ ' 

Since Ncen = or 1, the first term on the right hand side (RHS) of equation (4) and the first 
two terms on the RHS of equation (5) vanish. The first surviving terms represent pairs or triplets 
involving the central galaxy, while the last terms represents combinations that only involve satellites. 
To deal with the cross-correlation between occupations of central and satellite galaxies, we simply 
need to note that if Acen = then Agat = by definition. Therefore, (AcenAgat) = (Agat) and 
(AcenAsat(Asat - 1)) = (Asat(Asat - !))• Assuming a Poisson distribution for Agat, equations (4) 
and (5) reduce to 

(A^(A^ - 1)) = 2(A^sat) + (A^sat)' (6) 

and 

{N{N - 1){N - 2)) = S{N,,tf + (A^sat)'. (7) 

We plot predictions based on these two equations and the fits to (Acen)M and (Asat)M in the 
middle and bottom panels of Figure 3. Points in these panels are measurements from the SPH and 
SA calculations. We see that the 3-parameter model overpredicts the number of galaxy pairs and 
triplets in low mass halos, mainly because the number of satellites is overpredicted by ignoring the 
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profile of the low mass cutoff. For the 5-parameter fit to {N)m, the predicted numbers of pairs 
and triplets agree remarkably well with the measured values, providing further evidence that the 
Poisson approximation for -P(iVsat|(iVsat)M) is adequate for practical calculations. 

3.2. HOD for Young and Old Galaxies 

The top panels of Figure 4 show mean occupation functions for the young and old halves of the 
fig = 0.02/i^Mpc~^ sample, in comparison to {N)m of the full sample. Solid curves in the bottom 
panels show the fraction of young galaxies as a function of halo mass. These are basically the same 
as in Figure 13 of BOS. While the mean occupation number of old galaxies rises continuously as 
halo mass increases, the {N)m of young galaxies first rises, then declines to a local minimum at 
M ^ IO^'^A/q, then rises again. Sheth & Diaferio (2001) find similar non-monotonic behavior for 
the mean occupation of blue galaxies in the SA models of Kauffmann et al. (1999). The fraction 
of young galaxies (lower panel) has a steep drop at low occupation number and decreases slowly 
toward higher occupation number. 

The shapes of these mean occupation functions are easy to understand if we separate contri- 
butions from central and satellite galaxies, as shown in the middle panels of Figure 4. As shown 
by BOS (their Fig. 19), both the SPH simulation and the SA model predict that on average the 
(median/mean) stellar age of a halo's central galaxy is an increasing function of halo mass. As 
explained by BOS, this correlation between stellar age of the central galaxy and the mass of the 
parent halo arises from two physical effects: higher mass halos begin to assemble earlier, allowing 
an earlier onset of star formation in the central galaxy, and gas accretion rates drop once a halo 
becomes massive enough to support a virial shock (Keres et al. 2004) , choking off the formation of 
young stars at late times. In halos of mass near Mmin, therefore, central galaxies are usually young 
(i.e., below the sample's median age), while in high mass halos they are all old. The minimum in 
{N)m of young galaxies occurs for halos that are too massive to host young central galaxies but 
not massive enough to host satellite galaxies. The typical age of satellite galaxies also increases 
with halo mass, but the correlation is weaker (see BOS, Fig. 19) because satellites experience most 
of their growth in lower mass halos that merge into their final host halo. In Figure 4, the young 
galaxy fraction for central galaxies drops rapidly with increasing halo mass and falls essentially to 
zero, while the young galaxy fraction for satellites drops more slowly and in the SA calculation 
reaches a nearly flat plateau at high mass. The different HODs of young and old galaxies reflect 
fundamental aspects of galaxy formation physics, and their origin is transparent once we separate 
central and satellite contributions. 

As with the full sample HODs, we would like to find simple parameterized prescriptions that 
capture these age-dependent features, which we can do by describing the young galaxy central and 
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satellite fractions as a function of halo mass. Based on the bottom panels of Figure 4, we adopt 



/young,cen — fc 



1 + exp 



log M - log Mc^ ^ 



(8) 



where fc sets the amplitude and Mc and aiog characterize the transition mass scale and speed. 
We approximate the fraction of young satellite galaxies by an exponential, 

_ /_logM-bgMo\ 

/young,sat — Js 6Xp I I , (y) 

where Mq is the same satellite cutoff mass used in equation (3) and cJiog Ms characterizes the speed 
of the falloff. Top panels of Figure 5 show that the above functional forms for the fig = 0.02/i^Mpc~^ 
samples, together with our 5-parameter model for the full galaxy {N)m, allow accurate fits to the 
SPH and SA mean occupation functions. The parameters we use for the SPH(SA) model are fc = 
0.71(0.65), log Mc = 12.55(12.64), aiog m, = 0.26(0.14), /, = 0.99(0.80), and aiog = 1.50(1.10). 

To predict numbers of galaxy pairs and triplets, we need additional assumptions. We cannot 
automatically simplify the central-satellite cross terms of equations (4) and (5) to obtain equa- 
tions (6) and (7) because a halo may, for example, have an old central galaxy (A^cen,young = 0) but 
nonetheless have young satellites. To proceed, we assume that the satellite population in a halo of 
a given mass does not depend on the age of the central galaxy, and that the occupation numbers 
of young and old satellite galaxies follow independent Poisson distributions with respect to their 
individual means. Equations (6) and (7) are then modified by changing the first RHS terms to 
2(A^cen)(A?sat) and 3(A'cen)(A'sat)^, respectively. The middle and bottom panels of Figure 5, which 
compare pair and triplet predictions from the mean occupation fits to the SPH and SA results, 
suggest that the above approximations are accurate enough for calculations of galaxy clustering. 

Figure 6 shows HODs for samples defined by a bin of baryonic mass instead of a threshold; 
specifically, we take the less massive half of our fig = 0.02/i^Mpc~^ samples, and we again divide 
into young and old subsamples. The shapes of the mean occupation functions for all, young, 
and old galaxies look alike — a bump at lower halo masses resulting from central galaxies and an 
approximately power-law function toward higher halo masses representing the satellite contribution. 
Guzik & Seljak (2002) find a similar result for a luminosity-bin sample using the Kauffmann et al. 
(1999) SA model. For our chosen mass bin, the majority of central galaxies are young, and (since the 
young and old samples are roughly equal) the majority of satellites are therefore old. Lower mass 
bins have a higher fraction of young central galaxies, while high mass bins have predominantly old 
central galaxies. The 3-parameter and 5-parameter models described in § 3.1 are not appropriate 
for describing the HOD of a mass or luminosity bin because they do not allow an upper cutoff 
in {Nccji)m- However, these models do provide good descriptions for samples defined by mass or 
luminosity thresholds, and the HOD of a bin sample is just the difference of the HODs for thresholds 
at the bin's upper and lower limits by mass or luminosity thresholds, and the HOD of a bin sample 
is just the difference of the HODs for thresholds at the bin's upper and lower limits. 



3.3. HOD Parameters as a Function of Galaxy Mass/Luminosity 



As the threshold baryonic mass is increased, the predicted mean occupation function maintains 
the same general form but shifts horizontally towards higher masses in the log(A^)M^log M plane 
(BOS, Fig. 9). Figure 7 examines the dependence of characteristic halo mass scales for central 
and satellite galaxies on the baryonic mass threshold. The lower solid curves represent Mmin, the 
characteristic minimum mass for hosting a central galaxy above the threshold. Since {Nccn)M is 
not a strict step function, we simply define Mmin in this figure as the mass at which {N)m = 0.5; 
half of the halos with M = Mmin contain a central galaxy above the threshold, and half do not. 
This halo mass threshold is linearly proportional to the baryonic mass threshold, Mmin oc Mgai, for 
Mmin 3 X 10^^ Mq, implying that the baryonic mass of the central galaxy is proportional to the 
mass of the host halo in this regime. The dot-dashed line in each panel marks Mmin = (^6/^m)Mga,i, 
expected if the ratio of central galaxy mass to halo virial mass is equal to the universal baryon 
fraction. The SPH results lie close to this limiting case for low halo masses, while the SA central 
galaxies never accrete more than about 25% of the available baryons. As discussed by BOS, this 
difference can be attributed to treatments of the gas core radius and stellar feedback in the SA 
model, which are adjusted to fit the observed galaxy luminosity function. The SPH simulation 
predicts excessively massive galaxies, assuming a standard stellar initial mass function (IMF). 
Additional processes such as photoionization by the UV background or supernova gas blowout may 
suppress accretion in very low mass halos and produce an upturn in the Mmin-Mgai relation, but 
they are not evident in the mass range probed here. 

In high mass halos, some of the baryonic mass goes into satellite galaxies, and the efficiency 
of gas cooling drops so that a smaller fraction of baryons are accreted onto galaxies (BOS, Fig. 5). 
As a result, Mmin begins to grow faster than Mgai at masses Mmin ^ 3 x 10^^ M0, corresponding 
to group or cluster halos. The SA model predicts a much steeper rise than the SPH simulation, a 
consequence of resolution effects in the SPH simulation and of the different ways the two models 
treat cooling and feedback. The SPH simulation includes thermal feedback from supernovae, but 
the thermal energy is usually deposited in dense gas and radiated away before it can drive a galactic 
wind (Katz, Weinberg, & Hernquist 1996). This relatively efficient cooling produces overly luminous 
galaxies at the centers of groups and clusters (again assuming a standard stellar IMF) , though the 
baryonic masses and hence luminosities are reduced somewhat when the numerical resolution is 
increased (see BOS, Fig. 5). In the SA model, the formation of very bright galaxies is suppressed by 
adjusting the core radius in the gas density profile, which controls how much of the gas can cool, 
so that the model matches the bright end of the observed galaxy luminosity function (Cole et al. 
2000). The core radius mechanism is somewhat artificial, and the "solution" in the real universe 
probably involves some additional physics such as thermal conduction or AGN-driven superwinds 
(see Benson et al. 200Sa) or reduced cooling efficiency in multi-phase halo gas (Mailer &; Bullock 
2004). However, the trend of Mmin with Mg^i is likely to resemble the SA curve shown here for any 
model that matches the observed galaxy luminosity function, since the halo mass function is fixed 
by the cosmology and the luminosity function by observations. 
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The upper solid curves in each panel of Figure 7 show the dependence of Mi on galaxy baryonic 
mass, where Mi is the mass of halos that on average contain one satellite galaxy (i.e., {N)m = 
2) above the baryonic mass threshold. These curves look remarkably like scaled versions of the 
Mmin curves. The dashed curves represent 14Afinin and ISMmin for the SPH and SA models, 
respectively, and they match the Mi curves almost perfectly except for very massive halos. The 
smaller scaling factor in the SPH simulations is at least partly a consequence of the more efficient 
cooling onto central galaxies, discussed above, which leads the simulation to produce galaxies that 
are increasingly overluminous as halo mass increases. The large gap between Mmin and Mi, more 
than an order of magnitude in both calculations, produces the plateau in the mean occupation 
function between {N)m ~ 1 and {N)m ~ 2. As discussed by BOS, a halo that is only a few times 
Mmin usually "spends" its extra baryons building a larger central galaxy, instead of making two 
galaxies of comparable mass. A halo has to be ~20 times more massive than Mmin before it has a 
high chance of merging with a halo massive enough to contribute a satellite above the threshold. 

For N-body subhalos analyzed in K04, the scaling factor between Mmin and Mi is about 25 at 
z = (and becomes smaller at higher z, e.g., ^^10 at z = 3). The modest difference between our 
results and theirs may be caused partly by slight differences in defining Mmin and partly by the 
difference in the cosmological models adopted in the calculation. K04 also find that the high mass 
slope (a) of the satellite mean occupation is very close to one, independent of the mass threshold. 
We find similar results through our fits to the satellite occupation functions for galaxy samples 
with different baryonic mass thresholds (see Table 1). 

4. Relations betw^een Galaxy Mass/Luminosity and Halo Mass 

4.1. Conditional Mass/Luminosity Function as a Function of Halo Mass 

The conditional luminosity function (CLF; Yang, Mo, &: van den Bosch 2003) is the luminosity 
function $(L|M) of galaxies that reside in halos of a given mass M. The CLF encodes the luminosity 
dependence of the mean occupation function, since a complete characterization of $(L|M) allows 
one to examine the M-dependence for any specified range of L. This formalism has been used 
to model observations of the 2dFGRS and the DEEP2 redshift surveys by simultaneously fitting 
measurements of the galaxy luminosity function and luminosity dependent clustering (Yang, Mo, 
& van den Bosch 2003; van den Bosch, Yang, & Mo 2003a; van den Bosch, Mo, & Yang 2003c; 
Yan, Madgwick, & White 2003). A derived or assumed CLF can be used to construct mock 
catalogs, allowing more detailed tests of the CLF and cosmological model (e.g.. Mo et al. 2004; 
Yan, White, &: Coil 2004; Yang et al. 2004). The spirit of CLF modeling is similar to that of fitting 
parameterized HOD models like those of §3 to observed clustering in different luminosity ranges, 
as undertaken for the SDSS by Zchavi et al. (2005) (also see Seljak et al. [2005] for an analysis 
using galaxy-galaxy lensing measurements). In HOD fits, information about the galaxy luminosity 
function enters through the number density constraints. The CLF approach attempts to solve for 
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the entire luminosity dependence of the HOD at once, with the consequence that it requires a 
more detailed parameterized form for model fitting. The above mentioned papers usually assume 
that the CLF follows a Schcchtcr (1976) function at each halo mass, and they parameterize the 
halo mass dependence of the normalization, faint end slope, and characteristic luminosity of this 
Schechter function with various functional forms. The central galaxy in each halo is chosen to be 
the most luminous one of the galaxies drawn from the CLF. 

Since our galaxy sample is constructed based on a baryonic mass threshold, here we first 
examine the theoretical predictions for conditional galaxy baryonic mass functions, which we refer 
to as the CMF (note that M here represents galaxy baryonic mass, not halo mass). The CMF 
should be similar to the CLF, though the systematic mass dependence and scatter of stellar mass- 
to-light ratios will stretch and smooth the CLF somewhat relative to the CMF. The qualitative 
results for the SPH simulation are similar to those for the SA model except for the overall shift in 
baryonic mass scale. We concentrate on the SA model because it has better statistics and because 
it better matches the observed galaxy luminosity function. The SA model also predicts galaxy 
luminosities, and Benson et al. (2003b) have presented SA calculations of CLFs. We compare CLF 
and CMF results later in this section. 

Figure 8 plots the CMF in bins of halo mass running from logM = 11.6dr0.1 to logM = 
14.6 lb 0.1. Solid histograms show the full CMF, dashed histograms the contribution of central 
galaxies only, and points with Poisson error bars the contributions of satellite galaxies only. Since 
we consider only galaxies with baryonic mass Mgai > 1.45 X 1O^°M0, the CMF in halos of mass 
M < IO^^Mq is completely dominated by central galaxies; the two contributions become equal at 
halo mass M lO^^M©, where {Nsat)M = 1 (Fig- 7). The central galaxy CMF is sharply peaked 
at a fixed halo mass, and it can be approximated by a Gaussian in logMg^i. Solid lines in Figure 9 
show the parameters of such Gaussian fits as a function of M. The dispersion is o" ~ 0.13 over a 
wide range, and much of this width reflects the ±0.1 size of our log M bins. The dispersion rises at 
low and high halo masses. At low M, the mean central galaxy mass is linearly proportional to halo 
mass, while the relation at high mass is much shallower, approximately Mgai oc M^/^. The ratio of 
the integrated central galaxy CMF to the total CMF follows {Ncen) M / {{Ncen) M + (-/Vsat)M) once 
the baryonic mass scale of the central galaxies is well above the threshold baryonic mass, which 
happens in halos more massive than ~ 2 x 10^^ M0. 

The satellite galaxy CMF rises monotonically towards lower Mg^i at any halo mass, and it 
can be approximated by a Schechter function that is truncated at the middle of the Gaussian 
representing the central galaxy CMF. Thin solid curves in Figure 8 show a model in which the 
satellite contribution is a sharply truncated Schechter function with = —1.5 and M*^^ = IO^^Mq 
at every halo mass^ and the central galaxy contribution is a Gaussian in logMgai with constant 
width a = 0.13. The mean of the Gaussian, shown by the dashed line in the middle panel of 



^Note that we use as to distinguish the Schechter function slope from the slope a of (iVBat)M- 
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Figure 9, is 



1 



1-1/7 



/x(M) = fit 



1 + c 



1 + c 



(10) 



where fit = 5.9 x IO^^Mq, Mt = 2.3 x lO^^M©, c = 0.49, and 7 = 5.8. The normahzation of the 
truncated Schechter function is determined by matching the average number of satelhte galaxies 
above the baryonic mass threshold in each halo mass bin, and the central galaxy CMF integrates 
to unity by definition (i.e., we assume that all halos contain a central galaxy of some mass, though 
it might be below our adopted threshold). The top panel of Figure 9 shows the relative amplitudes 
of central and satellite CMFs evaluated at the mean of the Gaussian component. Even though the 
relative normalization is the only parameter adjusted on a bin- by-bin basis, this global fit describes 
the SA model CMF quite well over the full range of halo masses plotted in Figure 8. 

With the CMF/CLF, one can populate halos from simulations to study various properties of 
galaxy clustering. For a halo of a given mass, one chooses the baryonic mass or luminosity for the 
central galaxy according to the central galaxy CMF / CLF and does similarly for the satellites. This 
procedure implicitly assumes that there is no correlation between the masses/luminosities of central 
galaxies and satellites in halos of fixed mass. Figure 10 tests the validity of this assumption by 
comparing satellite CMFs at different values of central galaxy baryonic mass in three narrow bins 
of halo mass. The central and satellite CMFs are normalized so that there is one central galaxy 
per halo. For IO^^Mq halos, there is a clear trend for a lower amplitude satellite CMF in halos 
with a more massive central galaxy. The trend is present but weak in the more massive halos. This 
trend could be a consequence of galactic cannibalism, with more massive central galaxies growing 
by consuming more satellites. Note that in a small fraction of halos, the most massive satellite 
exceeds the mass of the central galaxy. While the SA model always places the more massive galaxy 
from a pair of merged halos at the center of the new halo, that galaxy may be more gas rich than 
its partner, in which case feedback from star formation can leave it slightly lower mass in the end. 

A clear consequence of the central galaxy contribution is that the CMF at a given halo mass 
cannot be well approximated by a Schechter function, especially at intermediate masses IO^^'^Mq < 
M < 10^^ Mq where the central galaxy "bump" rises well above the extrapolation of the satellite 
CMF. At high M the fractional contribution of the central galaxy is small, and a Schechter function 
is a reasonable approximation for most purposes. At low M, where central galaxies dominate, the 
Gaussian CMF can be roughly approximated by a Schechter function with > 0. Yang, Mo, 
& van den Bosch (2003) indeed find > at low masses in their (Schechter-based) fits to the 
2dFGRS conditional luminosity function (see their eq. [20] and Table 1). Note, however, that if 
we lowered our galaxy mass threshold we would pick up lower mass (fainter) satellites in low mass 
halos, and a Schechter form would again become a poor description. 

Benson et al. (2003b) study the CLF in SA models and find a similar trend to that seen here 
in the CMF: in low mass halos the CLF has a "hump" at the bright end and a roughly power-law 
shape at the faint end, and in massive halos the "hump" disappears and the CLF approaches the 
Schechter form (see their Fig. 1). Figure 11 presents similar results for the cosmology and version of 
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GALFORM adopted here, in Sloan g and r bands. We truncate the histograms at Mg = —18.75 and 

Mr = —19.25, since scatter in the mass-luminosity relation would make our mass-limited galaxy 
samples incomplete at fainter magnitudes. The central galaxy hump is much broader than in the 
CMF, and it therefore merges more continuously into the total CLF, though it is still visible in the 
low mass histograms. While the total CLF roughly resembles a Schechter function with a faint-end 
slope that changes steadily with halo mass, our results suggest that a better model for empirical 
fitting would be a suitably generalized version of the one used in Figure 8, with parameterized 
forms of {Nsa.t) M / {Nccn) M and l-t{M), and freedom (perhaps mass dependent) added to the width 
a of the central galaxy CLF and the Schechter parameters of the satellite CLF. Yang, Mo, & van 
den Bosch (2003) considered a model of this form in their first paper on the CLF and showed that 
it yields a good fit to the 2dFGRS data, though they have mostly adopted the Schechter form in 
their later work. 

How does separating central and satellite galaxies change our view of the overall galaxy mass 
function, and hence the luminosity function? Figure 12a shows the galaxy baryonic mass function 
(MF) of the SA model and the individual contributions of central and satellite galaxies. Both 
of these contributions, and the total MF, can be reasonably well fit by Schechter functions, even 
though the CMF at fixed halo mass cannot. Central galaxies dominate the total MF at every mass; 
the satellite contribution is ~ 37% in our lowest mass bin (corresponding to luminosity ~ 0.18L*) 
and < 18% for Mgai > IO^^Mq (L > 1.20L*). The dominance of central galaxies is a consequence 
of the large gap between the minimum halo mass Mmin for hosting a central galaxy and the mass 
Ml ~ ISMmin required to host a satellite of the same mass. The more massive halos are rarer, 
especially in the exponential cutoff region of the halo mass function, so even though a massive 
halo can host multiple satellites, the central galaxies of abundant, lower mass halos dominate by 
number. Figure 12b amplifies this point by dividing the MF into contributions from different 
halo mass ranges. In observational terms, most galaxies are found in group environments, but 
these groups (at least if defined at an overdensity ~ 200) typically have a central galaxy that is 
substantially brighter than its neighbors. If one chooses a random galaxy of a given luminosity 
(especially with L > L*), then it is more likely to be the dominant galaxy of its own group rather 
than the satellite of a more luminous system. 

Our results on the decomposition of the MF are in agreement with those found for the LF by 
Benson ct al. (2003b); see their Fig. 3. They also find that, except for very low mass halos (where 
the photoionization suppression of galaxy formation becomes important), the faint end slope of 
the CLF is steeper than that of the overall LF, ag ~ —1.5 versus ag ~ —1.2. The shallower 
faint end slope of the overall LF reflects the contribution of central galaxies. Our results for the 
galaxy MF are also similar to results found by other authors for the MF of dark matter subhalos 
in high resolution A^-body simulations. Vale & Ostriker (2004), based on the simulations of Weller, 
Ostriker, & Bode (2004), propose that the satellite subhalo mass function in a given halo can be 
fit by a Schechter function with low mass slope ~ —1.91, which is consistent with the power-law 
slope ~ —2 found by De Lucia et al. (2004). Compared with their results, we find a shallower 
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low mass slope {as ~ —1-5), probably because of the difference between dark matter and baryonic 
mass. Figure 1 of Vale &: Ostrikcr (2004) shows the overall mass functions of (satellite) subhalos 
and parent halos, which conveys similar information to the lower left panel of our Figure 12. 

4.2. Mass Distribution of Host Halos for Galaxies at Fixed Mass/Luminosity 

Orthogonal to the CMF/CLF is the conditional mass distribution of halos hosting galaxies 
at fixed baryonic mass or luminosity. This conditional distribution is just the product of the 
halo mass function and the mean occupation function of such galaxies (see Figure 6 for the mean 
occupation function for a sample of galaxies in a bin of baryonic mass). This quantity is relevant 
to the interpretation of galaxy-galaxy lensing measurements or to other methods of estimating the 
average mass distribution surrounding galaxies of specified properties. 

Figure 13 shows the SA model predictions for the conditional mass distribution of host halos 
in narrow bins of galaxy baryonic mass (top panels) and r-band luminosity (bottom panels). The 
distribution is separated into contributions from central and satellite galaxies as well as blue and 
red galaxies. The division of blue and red galaxies is based on a color cut at g — r = 0.734, which 
results in roughly equal numbers of blue and red galaxies. For galaxies of fixed baryonic mass, the 
host halos span a wide range of masses, with a peak at low masses contributed by central galaxies 
and a fairly fiat tail to high masses from satellites. The host halos of red galaxies, whether central 
or satellite, tend to be more massive than the host halos of blue galaxies. As the galaxy baryonic 
mass increases, the contributions to the conditional halo mass distribution from satellite galaxies 
and from blue galaxies decrease, as expected. 

As a function of galaxy luminosity, the conditional mass distribution of host halos has a 
trend similar to that with baryonic mass. However, the scatter between galaxy luminosity and 
baryonic mass broadens the distribution, especially the central galaxy peak, and it leads to greater 
separation between the typical halo masses of blue and red central galaxies. This result is analogous 
to the finding of Mandelbaum et al. (2004) , who assign luminosities to subhalos identified in high- 
resolution dissipationless simulations and show that scatter between subhalo circular velocity and 
galaxy luminosity makes the halo mass distribution at fixed luminosity wider. The width and 
asymmetry of the conditional mass distributions implies, first, that the mean halo mass estimated 
for a given type of galaxy can be very different from the typical mass of an isolated halo that hosts 
such a galaxy (i.e., from the location of the central galaxy peak in these histograms). Second, 
it suggests that interpretation of galaxy-galaxy lensing measurements will be simpler if one can 
classify galaxies by baryonic mass instead of luminosity and if one can separate isolated galaxies 
from those that are likely to be satellites in more massive halos. 



5. Summary and Discussion 



A number of previous studies have investigated the halo occupation function P{N\M) pre- 
dicted by semi-analytic models, hydrodynamic simulations, and high-resolution N-body simulations 
(Kauffmann, Nusser, & Steinmetz 1997; Benson et al. 2000; Seljak 2000; Scoccimarro et al. 2001; 
Sheth & Diaferio 2001; White, Hernquist, k Springcl 2001; Yoshikawa et al. 2001; Cooray & Sheth 
2002; Guzik & Seljak 2002; Scranton 2003; B03; K04), showing good agreement in the qualitative 
features predicted by the different methods. Here wc have followed the lead of Guzik &: Seljak 
(2002) and K04 by separating the contributions of central and satellite galaxies to P{N\M). When 
we consider galaxy samples defined by thresholds in baryonic mass, analogous to observational 
samples defined by thresholds in luminosity, our results for the Cole et al. (2000) SA model and the 
Dave, Katz, & Weinberg (2002) SPH simulation are similar to those found by K04 for subhalos in 
high resolution N-body simulations. In particular, the separation of central and satellite galaxies 
naturally explains the general shape of the mean occupation function {N)m and the transition 
from sub-Poisson fluctuations in P{N\M) at low N to roughly Poisson fluctuations at high N. 
The correlation between halo mass and central galaxy mass is tight, so {N)m for central galaxies 
rises sharply at a threshold mass Mmin and can be approximated by a step function. A halo has 
either zero or one central galaxies, so the width of P(A'ccn|-^) is sub-Poisson by definition. The 
mean occupation of satellites has a roughly power law form, {Nsat)M ~ (M/Mi)" with a ~ 1, 
and the fluctuations of P(A'sat|-^) about the mean arc close to Poisson. However, a halo must be 
~ 10 — 20Minin before it hosts on average one satellite galaxy above the baryonic threshold; in the 
mass decade above Mmin, a larger halo typically hosts a larger central galaxy instead of multiple 
galaxies above the threshold (BOS). In the halo mass regime where central galaxies make a signif- 
icant contribution to the total galaxy counts, the mean occupation rises slowly, and the width of 
the total P{N\M) is substantially narrower than a Poisson distribution. 

When a sample of galaxies above a baryonic mass threshold is divided in two on the basis of 
stellar population age, the HODs of young and old galaxies are markedly different. Halos near the 
cutoff mass Mmin tend to host young central galaxies, while more massive halos host old central 
galaxies. The mean occupation function of young galaxies exhibits a local minimum at halo masses 
M ~ lOMmin, where halos are too massive to have a young central galaxy but not massive enough 
to have satellites. The old galaxy population has a monotonically rising {N)m, and the number of 
old satellites in massive halos is larger than the number of young satellites. The statistics of old and 
young satellites are consistent with Poisson distributions about their respective means, uncorrelated 
with the age of the central galaxy. If one starts with a sample of galaxies in a bin of baryonic mass 
(instead of a mass-threshold sample), then the halos that host central galaxies occupy a relatively 
narrow mass range, and the ratio of old central galaxies to young central galaxies depends on the 
galaxy mass bin. For a low mass (or low luminosity) sample, most central galaxies are young, and 
the sample's older galaxies are therefore satellites in higher mass halos. With a higher mass (or 
luminosity) sample, more of the old galaxies are central objects. These differences in the HODs of 
young and old galaxies naturally explain much of the observed dependence of galaxy clustering on 
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color, spectral type, and morphology, with red/early- type galaxies exhibiting stronger correlations 
and residing in higher density environments. 

Two striking features of the theoretically predicted HODs are the near-constant ratio and 
large gap between the minimum host halo mass Mmin and the mass Mi at which an average halo 
hosts one satellite above the baryonic mass threshold. The ratio is Mi/Mmin ~ 14 for the SPH 
simulation and Mi/Mmin ~ 18 for the SA model, over a wide range in galaxy mass. The large value 
of Mi/Mmin accounts for the extended plateau in {N)m at low halo masses, and it has a number of 
important consequences. First, the conditional (galaxy) mass function (CMF) at fixed halo mass is 
not well described by a Schechter function. Central galaxies produce a "bump" in the CMF at high 
galaxy masses that rises well above the Schechter extrapolation of the satellite population. Scatter 
in the relation between baryonic mass and luminosity smears out but does not eliminate this bump 
in the conditional luminosity function (CLF). Second, at any given galaxy mass or luminosity, the 
total mass/luminosity function is dominated by central galaxies, because the massive halos that can 
host multiple satellites are much less abundant than halos with M ~ Mmin- The satellite fraction 
is larger at low masses/luminosities, where halos with M ~ Mi are not yet on the exponential tail 
of the halo mass function. 

The large value of Mi /Mmin also plays a key role in shaping the galaxy correlation function 
^(r). As first emphasized by Benson et al. (2000), and confirmed in many subsequent investigations 
(e.g., Peacock & Smith 2000; Seljak 2000; Scoccimarro et al. 2001; BerUnd & Weinberg 2002), the 
sub-Poisson width of P(A''|M) at low masses is crucial to reproducing the observed, roughly power 
law form of ^(r), because it enables low mass halos to host large numbers of galaxies without 
hosting large numbers of small separation pairs. As shown here and in K04, this sub-Poisson 
width is a consequence of the nearest-integer statistics of central galaxy occupations, and it holds 
over an extended range of halo masses because Mi /Mmin is large. The roughly constant value 
of Mi/Mjiiin enables galaxies with a wide range of luminosity to exhibit power law correlation 
functions. In detail, HOD models can also explain the deviations of ^(r) from a power law found 
in high precision measurements, as shown by Zehavi et al. (2004). 

The nearly constant value of Mi/Mmin and the nearly Poisson form of P{Nsa,t\M) at fixed M 
are pleasingly simple results, and in qualitative terms they seem intuitively sensible. However, we 
do not have a quantitative explanation for either one of them. An important clue is that K04 find 
essentially the same results for samples of dark matter subhalos defined by circular velocity thresh- 
olds in purely gravitational simulations, which suggests that they emerge mainly from dark matter 
dynamics and that gas physics and star formation serve to light up the dark matter substructures 
in a fairly straightforward fashion. The Mi/Mmin scaling and form of P(A'sat|-^) should then be 
determined mainly by the statistics of halo merger histories, though they will also be affected by 
tidal disruption of subhalos and galaxies and by mergers induced by dynamical friction. Analytic 
methods have been developed to model these processes (Bond et al. 1991; Bower 1991; Lacey &; 
Cole 1993; Bullock, Kravtsov, & Weinberg 2000; Taylor & Babul 2004). A recent theoretical study 
of the formation and evolution of substructures using the methods of Zentner et al. (2005) shows 
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that tidal stripping and disruption are the most important mechanisms shaping the HOD, with dy- 
namical friction sub-dominant (A. Kravtsov, private communication). Further investigation along 
these lines may lead to a more fundamental theory of the galaxy HOD and generic predictions for 
its dependence on redshift and on cosmological parameters. 

Recent observational analyses show qualitative and to some degree quantitative agreement 
with many of our theoretical predictions. Zehavi et al. (2005) show that the projected correlation 
functions of SDSS galaxy samples with a wide range of luminosity thresholds can be well fit by the 3- 
parameter (Mmin, Mi, a) HOD models described in §3.1 for "concordance" values of cosmological 
parameters. The fitted values of Mmin and Mi show the predicted linear scaling relation, with 
Ml Si 23Minin, and the dependence of Mmin on luminosity resembles the predicted dependence 
on galaxy baryonic mass (compare Zehavi et al.'s Fig. 18 to our Fig. 7). The measured ratio 
Ml /Mmin ~ 23 is higher than the value of 14 predicted by the SPH simulation, probably for the 
same reasons that the SPH simulation fails to match the observed galaxy luminosity function. The 
SA model is designed to match the luminosity function fairly well, and its predicted ratio of 18 is 
closer to the measured value (the factor ~25 in the dissipationless simulations of K04 is closer still). 
The remaining discrepancy could largely reflect the difference between a luminosity-threshold and 
mass-threshold galaxy sample, since scatter in stellar mass-to-light ratios allows some galaxies of 
the luminosity-threshold sample to occupy lower mass halos, reducing Mmin and simultaneously 
raising Mi to keep the number density fixed. The difference in cosmological models (we assume 
higher $7^ and lower as than Zehavi et al.) and errors in deriving Mmin and Mi from the clustering 
data with a simplified HOD model can also contribute to the discrepancy. 

Zehavi et al. (2005) also find that the clustering of red and blue subsamples can be well 
described by HOD models like those described in §3.2 for old and young galaxy populations. Fits 

to the blue fraction of central and satellite galaxies as a function of halo mass show good qualitative 
agreement with our predictions (compare Zehavi et al.'s Fig. 22 to our Fig. 4). Van den Bosch, 
Yang, & Mo (2003b) carry out CLF fits to the 2dFGRS data, and their inferred dependence of late- 
type galaxy fraction on halo mass is qualitatively similar to our prediction for blue galaxy fraction. 
Except for the lowest luminosity bin, their inferred mean occupation function for luminosity-bin 
galaxy samples (see their Fig. 10) docs not show a clear bump like that seen in our Figure 6, 
probably because of their assumed Schechter function form of the CLF (see below). 

Zehavi et al. (2005) use their HOD fits as a function of luminosity to infer the conditional 

luminosity function in different ranges of halo mass. Given the good agreement with the predicted 
HOD results, it is not surprising that their inferred CLFs resemble our predicted conditional mass 
functions (compare their Fig. 21 to our Fig. 8). In particular, the large Mi/Mmin ratio leads to a 
central galaxy "bump" in the luminosity function of intermediate mass halos. Indeed, these bumps 
are more prominent than those seen in our conditional luminosity function predictions (Fig. 11), 
perhaps indicating that the SA model used here produces too much scatter in the baryonic mass- 
luminosity relation for central galaxies. Hansen et al. (2004) have measured the CLF directly in 
an SDSS cluster catalog (Bahcall et al. 2003) and find that bumps caused by the brightest cluster 
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galaxies emerge in the CLFs of low richness clusters. They also find that, as a result of the significant 
contributions from the brightest cluster galaxies, a Schechter function is not a good fit for the CLFs 
in many of the cluster richness bins. Eke et al. (2004) look for such central galaxy bumps in the 
CLF of groups and clusters in the 2dFGRS Percolation-Inferred Galaxy Group (2PIGG) catalog 
and do not find them, but they demonstrate using SA mock catalogs that measurement errors in 
the group masses would wash them out. Using a halo-based group finder (Yang et al 2005a), Yang 
et al (2005b) identify groups from the 2dFGRS and measure the HOD as a function of halo mass. 
They find a tight correlation between the mean luminosity of central galaxies and the halo mass, 
qualitatively similar to that shown in the central panel of Figure 9 of this paper. Although they 
obtain reasonably good fits to the CLF over the halo mass range 13 < log[M/ [h~^MQ)] < 14.5 with 
a Schechter function, the bright end of the CLF in halos of \og[M / {h~^ Mq)] ~ 13 starts to show a 
clear enhancement caused by the central galaxies. Lin, Mohr, & Stanford (2004) and Lin & Mohr 
(2004) find that the i^'-band LF of 2MASS cluster galaxies can be fit by a Schechter function if 
the brightest cluster galaxies are excluded. The brightest cluster galaxies seem to follow a different 
distribution and become less important in the total cluster light as the cluster mass increases. They 
also find that the distribution of satellite numbers at a fixed estimated mass is roughly Poisson. 
Miles et al. (2004) study a sample of 25 groups drawn from the Group Evolution Multi-wavelength 
Study (GEMS), and they find prominent bumps at the bright end of the 5-band and i?-band LFs of 
the group galaxies, which are very likely caused by central galaxies. The isolated, luminous. X-ray 
bright ellipticals known as "fossil groups" (Vikhlinin ct al. 1999; Jones et al. 2003) may represent 
an extreme example of the central galaxy bump in intermediate mass halos. All of these results 
are in good qualitative agreement with the predictions here, though more careful replication of the 
observational selection would be needed for quantitative comparison. 

In our decomposed CMFs and CLFs (Figs. 8 and 11), the satellite contributions are fairly well 
described by a truncated Schechter function with constant faint-end slope ag, but the shape of 
the total CMF/CLF changes with halo mass because of the changing relative amplitude of central 
and satellite contributions. Yang, Mo, & van den Bosch (2003) infer the CLF from 2dFGRS data 
assuming a Schechter form at each halo mass, and they find a steadily changing ct^ that becomes 
positive in low mass halos, which might plausibly be a consequence of describing the roughly Gaus- 
sian luminosity function of the central galaxies with a Schechter function. The Schechter-|-Gaussian 
model described in §4 should be a better parameterized form for fitting observational data. It is 
not clear how forcing a Schechter function fit might influence the cosmological conclusions from 
the CLF method (Yang, Mo, & van den Bosch 2003; van den Bosch, Mo, & Yang 2003c), but we 
would expect some impact. For example, a Schechter-based CLF fit that gives the correct halo 
occupations of galaxies will give incorrect occupations, and hence incorrect bias factors, for 2L* 
galaxies, and thus alter the inferred amplitude of mass clustering. Yang, Mo, &; van den Bosch 
(2003) investigate an alternative parameterization for the CLF similar to that advocated here: they 
assume the central galaxy CLF to be a log-normal function for low-mass halos, with the width being 
a free parameter. With this alternative parameterization and a concordance cosmological model, 
they find that the inferred width in the central CLF is more or less consistent with the scatter 
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in the Tully-Fisher relation, and their fits to the global luminosity function and the luminosity 
dependence of the correlation length do not change much (see their Fig. 7). The resulting best-fit 
HOD has noticeable changes, especially at the low halo mass end. This test suggests that the 
assumed CLF form can have a significant impact on inferred halo galaxy populations but may have 
little effect on cosmological constraints derived from the galaxy luminosity function and luminosity 
dependence of galaxy clustering strength. 

The qualitative agreement between our predictions and the results of Zehavi et al. (2005) 
suggests that current theoretical models of galaxy formation are on largely the right track. There 
should be interesting progress in the near future as more clustering measurements are incorporated 
into the observational determinations of the HOD. For example, the group multiplicity function 
places fairly direct constraints on P{N\M) at high halo masses (Peacock & Smith 2000; Marinoni 
& Hudson 2002; Berlind & Weinberg 2002; Kochanek et al. 2003), and void statistics and galaxy 
scaling relations are sensitive to behavior in the cutoff region near Mmin (Berlind Sz Weinberg 
2002), so combinations of these measures with the correlation function can constrain more flexible 
parameterizations of the HOD. Our 5-parameter model provides a near-perfect description of the 
theoretical results and offers a good starting point for fitting observations, but ultimately one 
would like to test the ingredients of this model by allowing more general forms for {N)m, non- 
Poisson statistics for satellites, and so forth. Direct measurements of galaxy profiles in groups 
and clusters can refine the standard assumption (supported by the simulation results of BOS) that 
satellite galaxies trace the underlying dark matter distribution within halos. Detailed empirical 
determinations of the HOD for different classes of galaxies will test theoretical models of galaxy 
formation in greater detail than previously possible, and they may provide guidance to additional 
physics that should be incorporated in these models. 

A still more ambitious goal is to constrain cosmological parameters and the HOD simultane- 
ously using observed galaxy clustering, either on its own or in combination with other observables. 
Based on the CLF approach, van den Bosch, Mo, &: Yang (2003c) have demonstrated that the 
galaxy luminosity function, luminosity dependence of the galaxy correlation length, and mass- 
to-light ratios of galaxy clusters can impose interesting constraints on cosmological parameters, 
either on their own or in combination with CMB data. Abazajian et al. (2005) combine CMB 
data with the projected correlation function of SDSS galaxies brighter than M,. = —21, adopting 
parameterized HOD models like those in §3.1. Cosmological constraints from this combination of 
measurements are as tight as those found by combining CMB data with the large scale galaxy 
power spectrum (e.g., Percival et al. 2003; Spergel et al. 2003; Tegmark et al. 2004), where instead 
of an HOD model one assumes that galaxy bias is scale-independent in the linear and near-linear 
regime. Tinker et al. (2005) show that the SDSS correlation function constraints and observational 
estimates of cluster mass-to- light ratios imply that the values of and/or erg ^ire lower than 
the traditionally adopted "concordance" values of 0.3 and 0.9, in agreement with van den Bosch, 
Mo, & Yang's (2003c) conclusion based on CLF analysis of the 2dFGRS. The HOD approach will 
become more powerful as more clustering measurements are incorporated, especially observables 
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like redshift-space distortions and galaxy-galaxy lensing that are directly sensitive to mass. Even 
with a very flexible HOD parameterization, galaxy clustering measurements alone can pin down 
cosmological parameters like Q„i and erg with reasonable precision (Z. Zheng and D. Weinberg, in 
preparation). In concert with CMB anisotropy, the Lya forest. Type la supernovae, and other ob- 
servables that probe different scales and redshifts, careful modeling of low redshift galaxy clustering 
allows sharpened tests of the nature of dark energy, the masses of neutrinos, and the spectrum of 
density and gravity wave fluctuations that emerged from the early universe. 
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Table 1. HOD Parameters for Galaxy Samples with Different Thresholds of Baryonic Mass 
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Note. — Number density and mass are in units of /i^Mpc"^ and M©, respectively. 
Columns 3-5 are for the 3-parameter model and Columns 6-10 are for the 5-parameter 
model (see the text). For the 3-paramctcr model, Afmin is simply set to be the halo mass 
at which {N)m = 0.5, and Mi and a are obtained through a power-law fit to data points 
with (iVsat)M > 0.1. 
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Fig. 1. — Mean occupation number and scatter as a function of halo mass, separated into central 
and satellite galaxies. Predictions are shown for the fig = 0.02/i^Mpc^^ samples from the SPH sim- 
ulation (left panels) and from the SA model (right panels). Lower panels plot the mean occupation 
numbers of central, satellite, and all galaxies. In the upper panels, circles show {N{N -1))^^^ /{N), 
indicating the width of the probability distribution, for all galaxies (filled circles) and satellite 
galaxies (open circles). For Poisson P{N\M), this ratio would be one (dotted line). This figure can 
be compared to Fig. 4 of K04. 
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Fig. 2. — Probability distributions of satellite numbers as a function of the mean occupation number 
of satellites, predicted by the SPH simulation (left panels) and by the SA model (right panels). 
Points are predictions of the models, and the Poisson error in each bin is assigned as the error bar. 
The dotted histogram in each panel shows the Poisson distribution of the same mean. 




Fig. 3. — Parameterized fits to mean occupation functions (top panels) and predicted numbers of 
galaxy pairs and triplets (middle and bottom panels) for the SPH simulation (left) and the SA model 
(right). For each model, left panels show results based on 3-parameter fits, which assume sharp 
cutoff profiles of {Ncen)M and (A^sat)M, and right panels show results of fits with more parameters 
to model the cutoff profiles (see eqs. [1] and [3]). Fits and predictions are plotted as curves, and 
circles are measurements from the models. 
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Fig. 4. — Age dependence of the HOD predicted by the SPH simulation (left panels) and by the SA 
model (right panels). For each model, the mean occupation functions of old and young galaxies are 

shown in the top panel, contributions from central and satellite galaxies to the mean occupation 
number are plotted in the middle panel, and the fraction of young galaxies (in central, satellite, 
and all galaxies) is plotted in the bottom panel. 
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Fig. 5. — Same as Fig. 3, but for each model the left panels show the young galaxy sample and 
right panels the old galaxy sample. Points show the model results and lines show fits using the 
5-parameter model for all galaxies and the blue fraction parameterization described in the text. 




Fig. 6. — HOD predicted by the SPH simulation (left panel) and by the SA model (right panel) for 
a sample of galaxies in a bin of baryonic mass, containing the less massive half of the full galaxy 
sample. Dotted and dashed curves show young and old galaxy contributions, respectively. 
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Fig. 7. — HOD parameters as a function of galaxy baryonic mass threshold (left panel for the 
SPH simulation and right panel for the SA model). In each panel, the thick solid curve is Mmin, 
the characteristic minimum mass of halos that can host galaxies above a given threshold, which is 
defined here as the mass of halos that have (A^cen)M = 0.5. The two dotted curves denotes masses 
of halos that have {Ncen)M = 0.1 and {Ncen)M = 0.9, respectively. The thin solid curve is Mi, 
which is the mass of halos that can on average host one satellite galaxy above the given threshold. 
The dashed curve indicates 14Mjnin (SPH) or ISAfmin (SA). For comparison, the dot-dashed line 
shows the baryonic mass corresponding to the universal baryon fraction. 
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log[M^yMj 

Fig. 8. — Conditional galaxy baryonic mass functions predicted by the SA model as a function of 
halo mass (the label in each panel is the range of log[M/M0]). In each panel, the CMF is normalized 

in such a way that the number of central galaxies in a halo is unity. The total CMF (thick solid 
histogram) is decomposed into contributions from central galaxies (dashed histogram) and satellites 
(dots with Poisson error bars). The thin solid curve is a sum of a truncated Schechter function 
(dotted curve for satellite galaxies) and a Gaussian function (dotted curve for central galaxies), 
using the parameters shown by the dashed lines in the middle and bottom panels of Fig. 9 and 
further described in the text. 
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Fig. 9. — Parameters related to the CMF in Fig. 8. The middle and bottom panel show the best-fit 
values of the mean n and the width a of the Gaussian function representing the central galaxy 
CMF. For Gaussian fits shown in Fig. 8, an analytic function (dashed curve in the middle panel) 
is used to represent the mean fi, a constant a is adopted (dashed line in the bottom panel), and 
the normalization is set to have one central galaxy in a halo. The top panel shows the ratio of 
the amplitudes of the truncated Schechter function in Fig. 8 (CMF for satellite galaxies) and the 
Gaussian function (CMF for central galaxies) evaluated at the mean of the Gaussian function. 
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Fig. 10. — The satellite CMF as a function of central galaxy mass in three halo mass bins, 
log[M/M0] =13.0 (top), 13.6 (middle), and 14.2 (bottom). For each halo mass, the first three 
panels show satellite CMFs (dotted histograms with points) at different central galaxy masses 
(solid histograms are central galaxy CMFs); satellite CMFs in these three panels are plotted to- 
gether in the right panel for direct comparison. CMFs are normalized such that there is one central 
galaxy in a halo. 
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Fig. 11. — Conditional luminosity functions in gi-band (top) and r-band (bottom) predicted by the 

SA model as a function of halo mass. Similar to Figure 8, the label in each panel marks the range 
of log[M/MQ]. The dotted, thin solid, and thick solid histograms arc CLFs for satellite, central, 
and all galaxies, respectively. The faintest bin of the CLF is set by the luminosity above which the 
completeness fraction is unity (see the text). 
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Fig. 12. — Contributions of central and satellite galaxies to the overall galaxy mass function of 
the SA model. In the lower left panel, the heavy histogram shows the total mass function (in unit 
of /i^Mpc~^dex~^), while lighter histograms show the central and satellite contributions. Dotted 
curves show Schechter function fits. The upper left panel shows the fraction of satellite galaxies 
in each galaxy mass bin. The right panels show the separate contributions from different mass 
ranges, with curves to the right showing central galaxy contributions and curves to the left showing 
satellite contributions. 
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Fig. 13. — Mass distribution of host halos for galaxies at fixed baryonic mass (top) and at fixed 
r-band luminosity (bottom) predicted by the SA model. The label in each panel marks the range 
of log[Mgai/M0] or Mr- Contributions from central and satellite galaxies are represented by thin 
solid and dotted histograms, respectively. They are further divided into red and blue histograms 
for red and blue galaxies with a color cut aX, g — r = 0.734. Distributions are normalized such that 
the total area under the thick solid histogram (for all galaxies) is unity. 
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